Topic:Temporal Convolutional Networks
What is Temporal Convolutional Networks? Temporal convolutional networks (TCNs) are deep learning models that use 1D convolutions for sequence modeling tasks.
Papers and Code
Sep 10, 2025
Abstract:Isolated Sign Language Recognition (ISLR) is challenged by gestures that are morphologically similar yet semantically distinct, a problem rooted in the complex interplay between hand shape and motion trajectory. Existing methods, often relying on a single reference frame, struggle to resolve this geometric ambiguity. This paper introduces Dual-SignLanguageNet (DSLNet), a dual-reference, dual-stream architecture that decouples and models gesture morphology and trajectory in separate, complementary coordinate systems. Our approach utilizes a wrist-centric frame for view-invariant shape analysis and a facial-centric frame for context-aware trajectory modeling. These streams are processed by specialized networks-a topology-aware graph convolution for shape and a Finsler geometry-based encoder for trajectory-and are integrated via a geometry-driven optimal transport fusion mechanism. DSLNet sets a new state-of-the-art, achieving 93.70%, 89.97% and 99.79% accuracy on the challenging WLASL-100, WLASL-300 and LSA64 datasets, respectively, with significantly fewer parameters than competing models.
* 5 pages, 3 figures, ICASSP
Via

Sep 11, 2025
Abstract:Detecting incipient slip enables early intervention to prevent object slippage and enhance robotic manipulation safety. However, deploying such systems on edge platforms remains challenging, particularly due to energy constraints. This work presents a neuromorphic tactile sensing system based on the NeuroTac sensor with an extruding papillae-based skin and a spiking convolutional neural network (SCNN) for slip-state classification. The SCNN model achieves 94.33% classification accuracy across three classes (no slip, incipient slip, and gross slip) in slip conditions induced by sensor motion. Under the dynamic gravity-induced slip validation conditions, after temporal smoothing of the SCNN's final-layer spike counts, the system detects incipient slip at least 360 ms prior to gross slip across all trials, consistently identifying incipient slip before gross slip occurs. These results demonstrate that this neuromorphic system has stable and responsive incipient slip detection capability.
* 7 pages, 12 figures. Submitted to IEEE Robotics and Automation
Letters (RAL), under review
Via

Sep 09, 2025
Abstract:This study applies a range of forecasting techniques,including ARIMA, Prophet, Long Short Term Memory networks (LSTM), Temporal Convolutional Networks (TCN), and XGBoost, to model and predict Russian equipment losses during the ongoing war in Ukraine. Drawing on daily and monthly open-source intelligence (OSINT) data from WarSpotting, we aim to assess trends in attrition, evaluate model performance, and estimate future loss patterns through the end of 2025. Our findings show that deep learning models, particularly TCN and LSTM, produce stable and consistent forecasts, especially under conditions of high temporal granularity. By comparing different model architectures and input structures, this study highlights the importance of ensemble forecasting in conflict modeling, and the value of publicly available OSINT data in quantifying material degradation over time.
Via

Sep 10, 2025
Abstract:Tactile and kinesthetic perceptions are crucial for human dexterous manipulation, enabling reliable grasping of objects via proprioceptive sensorimotor integration. For robotic hands, even though acquiring such tactile and kinesthetic feedback is feasible, establishing a direct mapping from this sensory feedback to motor actions remains challenging. In this paper, we propose a novel glove-mediated tactile-kinematic perception-prediction framework for grasp skill transfer from human intuitive and natural operation to robotic execution based on imitation learning, and its effectiveness is validated through generalized grasping tasks, including those involving deformable objects. Firstly, we integrate a data glove to capture tactile and kinesthetic data at the joint level. The glove is adaptable for both human and robotic hands, allowing data collection from natural human hand demonstrations across different scenarios. It ensures consistency in the raw data format, enabling evaluation of grasping for both human and robotic hands. Secondly, we establish a unified representation of multi-modal inputs based on graph structures with polar coordinates. We explicitly integrate the morphological differences into the designed representation, enhancing the compatibility across different demonstrators and robotic hands. Furthermore, we introduce the Tactile-Kinesthetic Spatio-Temporal Graph Networks (TK-STGN), which leverage multidimensional subgraph convolutions and attention-based LSTM layers to extract spatio-temporal features from graph inputs to predict node-based states for each hand joint. These predictions are then mapped to final commands through a force-position hybrid mapping.
* 20 pages, 19 figures, accepted by IEEE Transactions on Robotics
Via

Sep 10, 2025
Abstract:Cardiovascular diseases (CVDs) remain a leading cause of mortality worldwide, underscoring the importance of accurate and scalable diagnostic systems. Electrocardiogram (ECG) analysis is central to detecting cardiac abnormalities, yet challenges such as noise, class imbalance, and dataset heterogeneity limit current methods. To address these issues, we propose FoundationalECGNet, a foundational framework for automated ECG classification. The model integrates a dual-stage denoising by Morlet and Daubechies wavelets transformation, Convolutional Block Attention Module (CBAM), Graph Attention Networks (GAT), and Time Series Transformers (TST) to jointly capture spatial and temporal dependencies in multi-channel ECG signals. FoundationalECGNet first distinguishes between Normal and Abnormal ECG signals, and then classifies the Abnormal signals into one of five cardiac conditions: Arrhythmias, Conduction Disorders, Myocardial Infarction, QT Abnormalities, or Hypertrophy. Across multiple datasets, the model achieves a 99% F1-score for Normal vs. Abnormal classification and shows state-of-the-art performance in multi-class disease detection, including a 99% F1-score for Conduction Disorders and Hypertrophy, as well as a 98.9% F1-score for Arrhythmias. Additionally, the model provides risk level estimations to facilitate clinical decision-making. In conclusion, FoundationalECGNet represents a scalable, interpretable, and generalizable solution for automated ECG analysis, with the potential to improve diagnostic precision and patient outcomes in healthcare settings. We'll share the code after acceptance.
Via

Sep 09, 2025
Abstract:Multivariate time series forecasting (MTSF) often faces challenges from missing variables, which hinder conventional spatial-temporal graph neural networks in modeling inter-variable correlations. While GinAR addresses variable missing using attention-based imputation and adaptive graph learning for the first time, it lacks interpretability and fails to capture more latent temporal patterns due to its simple recursive units (RUs). To overcome these limitations, we propose the Interpretable Bidirectional-modeling Network (IBN), integrating Uncertainty-Aware Interpolation (UAI) and Gaussian kernel-based Graph Convolution (GGCN). IBN estimates the uncertainty of reconstructed values using MC Dropout and applies an uncertainty-weighted strategy to mitigate high-risk reconstructions. GGCN explicitly models spatial correlations among variables, while a bidirectional RU enhances temporal dependency modeling. Extensive experiments show that IBN achieves state-of-the-art forecasting performance under various missing-rate scenarios, providing a more reliable and interpretable framework for MTSF with missing variables. Code is available at: https://github.com/zhangth1211/NICLab-IBN.
Via

Sep 08, 2025
Abstract:The evolution of the traditional power grid into the "smart grid" has resulted in a fundamental shift in energy management, which allows the integration of renewable energy sources with modern communication technology. However, this interconnection has increased smart grids' vulnerability to attackers, which might result in privacy breaches, operational interruptions, and massive outages. The SCADA-based smart grid protocols are critical for real-time data collection and control, but they are vulnerable to attacks like unauthorized access and denial of service (DoS). This research proposes a hybrid deep learning-based Intrusion Detection System (IDS) intended to improve the cybersecurity of smart grids. The suggested model takes advantage of Convolutional Neural Networks' (CNN) feature extraction capabilities as well as Long Short-Term Memory (LSTM) networks' temporal pattern recognition skills. DNP3 and IEC104 intrusion detection datasets are employed to train and test our CNN-LSTM model to recognize and classify the potential cyber threats. Compared to other deep learning approaches, the results demonstrate considerable improvements in accuracy, precision, recall, and F1-score, with a detection accuracy of 99.70%.
Via

Sep 05, 2025
Abstract:Blinks in electroencephalography (EEG) are often treated as unwanted artifacts. However, recent studies have demonstrated that blink rate and its variability are important physiological markers to monitor cognitive load, attention, and potential neurological disorders. This paper addresses the critical task of accurate blink detection by evaluating various deep learning models for segmenting EEG signals into involuntary blinks and non-blinks. We present a pipeline for blink detection using 1, 3, or 5 frontal EEG electrodes. The problem is formulated as a sequence-to-sequence task and tested on various deep learning architectures including standard recurrent neural networks, convolutional neural networks (both standard and depth-wise), temporal convolutional networks (TCN), transformer-based models, and hybrid architectures. The models were trained on raw EEG signals with minimal pre-processing. Training and testing was carried out on a public dataset of 31 subjects collected at UCSD. This dataset consisted of 15 healthy participants and 16 patients with Parkinson's disease allowing us to verify the model's robustness to tremor. Out of all models, CNN-RNN hybrid model consistently outperformed other models and achieved the best blink detection accuracy of 93.8%, 95.4% and 95.8% with 1, 3, and 5 channels in the healthy cohort and correspondingly 73.8%, 75.4% and 75.8% in patients with PD. The paper compares neural networks for the task of segmenting EEG recordings to involuntary blinks and no blinks allowing for computing blink rate and other statistics.
Via

Sep 05, 2025
Abstract:Speech denoising (SD) is an important task of many, if not all, modern signal processing chains used in devices and for everyday-life applications. While there are many published and powerful deep neural network (DNN)-based methods for SD, few are optimized for resource-constrained platforms such as mobile devices. Additionally, most DNN-based methods for SD are not focusing on full-band (FB) signals, i.e. having 48 kHz sampling rate, and/or low latency cases. In this paper we present a causal, low latency, and lightweight DNN-based method for full-band SD, leveraging both short and long temporal patterns. The method is based on a modified UNet architecture employing look-back frames, temporal spanning of convolutional kernels, and recurrent neural networks for exploiting short and long temporal patterns in the signal and estimated denoising mask. The DNN operates on a causal frame-by-frame basis taking as an input the STFT magnitude, utilizes inverted bottlenecks inspired by MobileNet, employs causal instance normalization for channel-wise normalization, and achieves a real-time factor below 0.02 when deployed on a modern mobile phone. The proposed method is evaluated using established speech denoising metrics and publicly available datasets, demonstrating its effectiveness in achieving an (SI-)SDR value that outperforms existing FB and low latency SD methods.
* Accepted for publication in Proceedings of the 2025 IEEE 27th
International Workshop on Multimedia Signal Processing (MMSP)
Via

Sep 04, 2025
Abstract:Dynamic facial expression recognition (DFER) faces significant challenges due to long-tailed category distributions and complexity of spatio-temporal feature modeling. While existing deep learning-based methods have improved DFER performance, they often fail to address these issues, resulting in severe model induction bias. To overcome these limitations, we propose a novel multi-instance learning framework called MICACL, which integrates spatio-temporal dependency modeling and long-tailed contrastive learning optimization. Specifically, we design the Graph-Enhanced Instance Interaction Module (GEIIM) to capture intricate spatio-temporal between adjacent instances relationships through adaptive adjacency matrices and multiscale convolutions. To enhance instance-level feature aggregation, we develop the Weighted Instance Aggregation Network (WIAN), which dynamically assigns weights based on instance importance. Furthermore, we introduce a Multiscale Category-aware Contrastive Learning (MCCL) strategy to balance training between major and minor categories. Extensive experiments on in-the-wild datasets (i.e., DFEW and FERV39k) demonstrate that MICACL achieves state-of-the-art performance with superior robustness and generalization.
* Accepted by IEEE ISPA2025
Via
